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Abstract 

Background: This paper considers automatic segmentation of the left cardiac ventricle in short axis magnetic 
resonance images. Various aspects, such as the presence of papillary muscles near the endocardium border, makes 
simple threshold based segmentation difficult. 

Methods: The endo- and epicardium are modelled as two series of radii which are inter-related using features 
describing shape and motion. Image features are derived from edge information from human annotated images. The 
features are combined within a discriminatively trained Conditional Random Field (CRF). Loopy belief propagation is 
used to infer segmentations when an unsegmented video sequence is given. Powell's method is applied to find CRF 
parameters by minimizing the difference between ground truth annotations and the inferred contours. We also 
describe how the endocardium centre points are calculated from a single human-provided centre point in the first 
frame, through minimization of frame alignment error. 

Results: We present and analyse the results of segmentation. The algorithm exhibits robustness against inclusion of 
the papillary muscles by integrating shape and motion information. Possible future improvements are identified. 

Conclusions: The presented model integrates shape and motion information to segment the inner and outer 
contours in the presence of papillary muscles. On the Sunnybrook dataset we find an average Dice metric of 
0.91 ± 0.02 and 0.93 ± 0.02 for the inner and outer segmentations, respectively. Particularly problematic are patients 
with hypertrophy where the blood pool disappears from view at end-systole. 



Background 

Properties of the cardiac left ventricle, such as volume, 
ejection fraction and wall thickness are important indi- 
cators for the diagnosis of many heart-related problems. 
Many of these are conveniently extracted from Mag- 
netic Resonance Imaging (MRI). Calculating these prop- 
erties requires accurate annotation of the left ventricle 
to isolate it from its surrounding structure. Figure 1 
illustrates the cardiac structure and human annotated 
inner and outer contours of the left ventricle in an 
MRI image. 
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Manual annotation is a tedious process and lacks 
consistency between human annotators [1,2] and even 
between separate annotations by the same annotator. 

Various properties of magnetic resonance images, such 
as intensity inhomogeneities, z-shift due to respiration 
and imaging acquisition artifacts, can make segmenta- 
tion difficult. One of the most severe problems arises 
from judging to what extent the papillary muscles influ- 
ence and, possibly, obscure the endocardium border. For 
research on the effects that discrepancies in annotations 
of the papillary muscles can have on the calculation of left 
ventricle function and mass see e.g. [2-4]. In this work we 
primarily focus on mitigating the effect of the papillary 
muscles. 

The examples in Figure 2 illustrate the presence of pap- 
illary muscles close to the endocardium border and a 
human annotators segmentation. When modelling the 
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Figure 1 MRI short axis view. MRI short axis view of ventricles with 


human annotated (inner and outer) contours shown in yellow. 


Surrounding tissue is omitted for illustrative purposes. 



structural properties of the ventricle wall, it is often desir- 
able to include these muscles inside the inner contour. 
From an inspection of the human annotations, it is clear 
that the presence of the endocardium border is inferred 
from information other than only a strong intensity gra- 
dient, possibly from prior knowledge of the motion and 
shape of the ventricle. Such considerations are difficult 
to integrate into a simple intensity threshold-based tech- 
niques of a single image, in particular when there is little 
difference between the intensity of the endocardium and 
surrounding structure. 

A number of automated techniques have been devel- 
oped for the segmentation of the left ventricle from its 
surrounding structure (see e.g. [5,6]). For a review of 
snakes and deformable models in medical image analy- 
sis see e.g. [7]. We will briefly discuss the most popular 
techniques. 

Active Shape Models (ASMs, [8]) track the edges in 
a video sequence by modelling the contour shape using 
methods such as Principal Component Analysis. This 
is often combined with a Kalman filter to incorporate 




Figure 2 Papillary muscles obscuring edge. Presence of papillary 
muscles obscure the edge of the inner contour due to its similar 
intensity to the cardiac wall. Human annotated inner and outer 
contours are shown in yellow. Surrounding tissue is cropped for clarity. 



temporal smoothing in an online tracking framework. 
Typically only past information is used and future images 
ignored, often with adverse consequences if the object 
being tracked disappears from view or becomes very 
small. 

Andreopoulos and Tsotsos [9] fit a 3D Active Appear- 
ance Model (AAM) and investigate a hierarchical "2D + 
time" ASM that integrates temporal constraints by using 
the third dimension for time instead of space. 

Generative models such as Markov Random Fields 
(MRF) are popular in pixel labelling and de-noising prob- 
lems [10]. Modelling the image probability can, however, 
lead to intractable models with complex dependencies 
between local features. This can lead to reduced perfor- 
mance if oversimplified [11]. Careful manual design of the 
probability distributions is therefore often necessary. 

Most image segmentation applications of MRFs also 
model the texture within a region and are constructed 
to favour spatially smooth regions. We refer to these 
models as Surface MRFs. Surface MRFs are often used 
to isolate homogeneous objects from their backgrounds. 
The left ventricle, however, contains papillary muscles 
(see Figures 1 and 2), rendering this approach less effec- 
tive. As surface MRFs do not model the edge explicitly, 
they do not directly encode any shape information. There 
have been attempts to unify models of the edge and sur- 
face: specifically, Huang et al. [12] propose coupling sur- 
face MRFs with a hidden state representing a deformable 
contour. 

Cordero-Grande et al. [13] construct an MRF model of 
the inner and outer contours, using a purely generative 
model. They use the Gibbs sampling algorithm to extract 
segmentations and parameters. 

Also of interest is the approach by Jolly [14], in which 
the segmentation problem is set in log-polar space where 
the least cost path in a single frame (calculated by dynamic 
programming) is defined as the desired contour. A cost 
function, which they relate to an initial labelling of blood 
pool pixels, is required to determine the correct contour. 
This is similar to our approach in that if we limit our 
model to a single frame (i.e. remove temporal linkage) 
belief propagation also reduces to estimating the least cost 
path via dynamic programming. 

In summary, a number of methods are available, with 
the main challenge being integration of temporal and 
spatial constraints. 

In an earlier study, Dreijer et al. [15] investigated mod- 
elling the endocardium edge using a semi-conditional 
MRF with mostly heuristically chosen features. Although 
this approach showed promise, practical experiments 
indicated a tendency to "snap" onto the epicardium, partly 
due to the epicardiums stronger edge with respect to sur- 
rounding tissue. In the present study, the outer contour is 
explicitly included in the model, establishing a statistical 
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dependency between the two contours. We also train 
discriminative feature functions from human annotated 
images as opposed to the heuristically chosen functions 
used previously. 

Methods 

Conditional random field 

A Conditional Random Field (CRF, [16]) models the con- 
ditional probability of a set of unobserved/latent variables, 
y, given a set of observations x, i.e. P (y\x). This is different 
from a generative Markov random field which models the 
probability distribution of the observed variables, given 
different configuration of the latent variables, i.e. P (x\y). 

A CRF represents the modelling probability as a prod- 
uct of local potential functions, defined over subsets of the 
latent variables, 



P(y\*) = \Y\i>c(y c ,x) 



(i) 



ccy 



where the normalizing partition function Z = ^ Y\ cCy 
ty c (y c , x) sums over all possible configurations of y. 

The main advantage of a CRF is its discriminative 
nature, i.e. it does not require a detailed model of the 
observed information, instead, computational resources 
are dedicated to describing the latent variables. 

Problem formulation 

We refer to the left ventricle endocardium as the inner 
contour and the left ventricle epicardium together with 
the right ventricles endocardium (bordering the septum) 
as the outer contour, see Figure 1. 

We are primarily interested in segmenting a video 
sequence of T grayscale images, 7(0), . . . ,/ (T— 1), that 
is synchronised with a single cardiac cycle so that the 
first image 1(0) is before systole (contraction) and the last 
image, I(T— 1), after diastole (relaxation). End-systole 
(maximum contraction) thus occurs in the middle of the 
video sequence at approximately I (T/2). We refer to the 



grayscale value of a pixel within a single image at the x, y 
coordinate p as I (t,p). 

Assuming an annular shape, an inner or outer contour 
in a frame at time t, can be represented by a sequence of 
N radii, r n (t), at uniformly spaced angles, n = 0, . . . , N— 1 
around a shared centre point, c(t). Figure 3 illustrates 
coordinates on the inner and outer contours using a small 
number of radii. In our implementation we use N = 128 
angular directions. 

We prefer to work in the discretised log-space of the 
radii, i.e. 



p n {t) = \M- r ini f log r n (t) J 



(2) 



where \_x\ is the floor function of x and ri n i t = 50 is exper- 
imentally chosen such that, for most segmentations in the 
training set, p n (t) ^ M/2 at end-diastole. The radius is 
discretised as p n e {0, . . . ,M— 1} where M = 256 pro- 
vides a resolution sufficient for human segmentation of 
the transformed image. One advantage of the log-space is 
that it provides a better spatial resolution at smaller radii. 

Figure 4 illustrates the log-polar transformed image, 
D(t) y of a single image, /(£). We denote image values in a 
radial direction (i.e. a row in the log-polar image), n, as the 
vector d n (t). The grayscale value of a pixel in the log-polar 
space is then referred to as d n (t, p). 

The two segmentation contours in a single frame are 
thus fully represented as two vectors of log-radii, 

P*it)={p£(t),...,p%_ 1 (t)} (3) 
p ou \t)={pr(t),...,p° N *(t)}, (4) 

around the centre point c(t). The segmentation of a video 
sequence of frames is represented by p = {p in ,/) out } 
around a series of centre points c = {c(t)} t=0} fT _i where 

p in = {p in (0),...,p in (T-l)} (5) 

pOut = r p out (0)f-- _ fp out (r _ 1) l (6) 
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Figure 4 Log-polar transform. Log-polar transform with human 


annotated inner and outer contours in yellow. The centre point is 


chosen near the middle of the blood pool. Ground truth inner and 


outer contours are indicated with solid yellow lines. 









Note that the position of the centre point is different for 
each frame. Partly due to the non-symmetrical contrac- 
tion of the heart, the ventricle centre point can undergo 
significant translation. 

Our segmentation process first determines a series 
of centre points, after which relative radial values are 
inferred. In the next section we turn our attention to 
finding suitable centre points. 

Centre point estimation 

Our CRF model requires the centre points to be placed 
near the middle of the inner contour allowing the model 
to restrict the spatial and temporal variability of the 
radii. As a semi-automatic clinical tool, a human oper- 
ator could annotate the centre point in every frame 
of the video sequence. In our work towards a fully- 
automatic technique we describe a semi-automated pro- 
cedure that requires only annotating the centre point in 
one frame, after which the other centre points are esti- 
mated. Fully-automatic techniques are especially valuable 
when analysing significantly large datasets. 

A number of heuristic techniques (e.g. [5]) are available 
for estimating the centre points. Most of these perform 
adequately when the papillary muscles are absent and 
there is high contrast between the blood pool and cardiac 
wall. This is generally the case for the first few frames, 
but not at the end of systole when the ventricle is at its 
smallest. 

The procedure described below requires the user to 
provide the centre point c(0) of the first frame when 
the left ventricle blood pool is clearly visible and the 
papillary muscles minimally obstruct the inner contour. 
The robustness against variations in c(0) is discussed 
in Section "Sensitivity to initial centre point placement". 
With an appropriate user interface the annotation of the 
center point in the first frame of all spatial slices would 
only take the user a few seconds. 



Since each video sequence from our dataset contains 
a single heart beat, and due to its periodic nature, we 
assume the last frame has the same centre point as the 
first, i.e. c(T — 1) = c(0). This is not a severe restriction 
since the algorithm is robust against variations in c(t), as 
demonstrated in Section "Sensitivity to initial centre point 
placement". 

To find centre points in the intermediate frames, 
c(l), . . . ,c(T— 2), we minimize a weighted between- 
frame alignment error, 

T-l 

error (c) £ w c{t) (p)-(l c{t) {Up) -f^ (t- 1,/>)) 2 , 
t=i p 

(7) 

where I c ^\t) is the image I(t) centred at c(t) such that 
I c ^ (t,p) = I (t,p — c(t)) and zero when indexed out of 
bounds, p is, again, the x, y coordinates and I(t) the frame 
at time t before a log-polar transform is applied. 

The weight w c ^ (p) = eH^-PlI 2 /* 2 ) i oca n y enhances 
the error around the frames centre point. The width, 
cr, is experimentally chosen from a training dataset so 
that the inferred centre points closely resemble the mean 
of the ground truth inner contours. For computational 
efficiency we only calculate values within a 2a radius 
of c(t) as the contribution becomes negligible further 
away. 

Note that (7) is a nonlinear function of the sequence of 
centre points, but is efficiently solved using an optimiza- 
tion strategy such as dynamic programming. In addition, 
in order to reduce computational cost a beam search is 
implemented, effectively assuming the centre point trans- 
lates less than three pixels between frames. Backtracking 
is initiated at the centre of the last frame c(T — 1). In 
order to constrain c(0) to the value provided by the user, 
during dynamic programming, the cumulative cost func- 
tion for the first frame is set to zero at that value and 1 
otherwise. 

We pose the centre point estimation problem as one 
that can be solved within the same belief propagation 
framework as our radial inference. However, since this is 
independent of the radial inference fast heuristic methods 
could be considered in the future. 

The CRF model 

Representing a temporal sequences of log-polar images as 
D = {£>(£) }f=o,...,r-i> an d assuming appropriately trained 
parameters 0 we model the conditional probability 
P(p\0,D) of a segmentation p. This is done through a 
log-linear CRF, 

P(p\O f D) = --^--exp(-E(p\O t D)). (8) 
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The energy E (p\0,D) is defined as the weighted sum of 
local feature functions/ defined over all cliques q e Q in 
its graphical model 



E(p\0,D) = J2°qfq(Pq> D )' 

qeQ 



(9) 



In this formulation, smaller energies indicate better seg- 
mentations, while bad segmentations are penalized with 
larger energies. 

The partition function 



Z(0,D) = ^exp(-£(p|0,D)) 



(10) 



sums over all possible configurations of p, normalizing the 
exponentiated energy into a probability. 

Note that there are no significant theoretical restrictions 
to the feature functions in a random field. Any further 
restrictions and choices of feature functions are design 
choices made to improve performance. 

Feature functions 

We restrict ourselves to positive feature functions/ where 
small values are more desirable (i.e. smaller values should 
correlate with "better" segmentations). Positive param- 
eters, 0, determine the relative weights of the feature 
functions. In order to minimize the clique size, we also 
restrict ourselves to features that couple at most two radial 
values in space and time, as computational cost grows 
exponentially with increasing clique size. 

A partial factor graph [17] in Figure 5 illustrates the tem- 
poral and spatial relationships between radius variables, p, 
in a single contour and rows in the log-polar transformed 
image, d n (t). The spatially circular nature (i.e. there is a 



feature connecting po(t) and pN-iif)) an d the relation- 
ships between the inner and outer contours are omitted 
for clarity. 

We derive the feature functions from discriminative 
properties of human annotated images as described 
below. Selection of the relative weights, 0, is discussed in 
Section "CRF parameter estimation". 

Feature function based on edge classifiers 

For a log-polar frame (such as in Figure 4) at time t, con- 
sider a window of height 1 and width w = ^ = 64 
around a radius p in the radial direction n. This win- 
dow is equivalent to a circular sector in the original 
image before the log-polar transform is applied. We 
refer to the pixel values in this window as the vector 
v p = d n (t, [p — . . . , p + ^]). A feature vector k (v) is 
derived from the window and is described below. 

We train an artificial neural network, with two nodes in 
a hidden layer, to model the presence of the cardiac edge. 
A window extracted from the training set is considered to 
contain an edge if the centre of the window is no more 
than 2 radial distances away from the ground truth edge, 
otherwise it is considered a "non-edge" training example. 

The feature vector, k (v), consists of the concatenation 
of four expressions of the gradient in the radial direc- 
tion, 0, that we suspect the classifier might find useful in 
discriminating between edges and non-edges, 



K(V) 



dv 

Yp 



dv 

dp 



.sign 



dv\ 



dv 



> 6 



(ID 



The expression > ej is a binary value indicating 

the presence of a gradient. 

From the short-axis view in Figure 1, it is worth noting 
that the gradient s sign on the left and right sides of the 
outer contours edge differ, due to the intensity of the right 



dn-l(t-l) 



dn-l(*+l) dn(*+l) d n+1 (t+l) 




Figure 5 Partial factor graph. A partial factor graph of the temporal and spatial relationships between radius variables, p, in a single contour and 
rows in the log-polar transformed image, d n (t). Factor labels and some variable labels are omitted for clarity. 
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ventricles blood pool. We therefore train eight classifiers 
for different parts of the contour, i.e. instead of training a 
single classifier over all angular directions (n = O.JV — 1) 
we treat groups of angles separately (n = 0, . . . , 15, n — 
16, . . . , 31 etc.) and thus train direction dependent classi- 
fiers. This allows the classifiers to exploit features that it 
might find relevant in that direction. 

We repeat the process for the classifiers of the inner 
contour's edges as the endocardium's edge behaviour also 
differs between the left and right sides of the ventricle due 
to the presence of the papillary muscles on mostly one side 
(see Figure 4). 

Heiberg [6] identifies the edges of the inner and outer 
contours as two classes: Concordant, where edge gra- 
dients have a similar sign, and Discordant, areas where 
the edge gradient sign differ. Accordingly, he explicitly 
includes the gradient sign for the inner contour and 
ignores the sign for the outer. Because our edge model is 
trained on annotated ground truth images, our classifier 
automatically differentiates between the signs when they 
are relevant to edge detection. 

To fit within the framework of energy minimization, the 
response of the neural network to an image is transformed 
into a cost by subtracting its output value from one. The 
minimum cost in the radial direction is subtracted to avoid 
negative feature values and is normalized by the sum. 

We construct these networks for the inner and outer 
contour edges and so derive the features/ 111 d n (t)) 

and/° ut (p° ut (*)> 

It should be noted that the typical neural network 
training techniques attempt to minimize a classification 
error, however we are not so much interested in the 
classification as using the output to penalise non-edges 
while at the same time minimally penalising edges. Effec- 
tively, we would prefer a classifier that rejects very few 
edges. To mitigate unintended penalisation, the outputs 
of the constructed features are "smoothed" by applying a 
small minimum-filter (erosion), in the angular direction. 
This will cause an area to have a high feature response 
(i.e. penalised) only if its neighbouring areas (in the angu- 
lar direction) also have high responses. 

Figures 6 and 7, respectively, show the resulting inner 
and outer features' responses to the frame in Figure 4. 
Note that the response is relatively low in the area of 
the ground truth. Recall that more desirable segmenta- 
tions have smaller feature values. It should be emphasized 
that these response images have been obtained without 
taking into account any temporal behaviour or continu- 
ity requirements. We now investigate how to incorporate 
these properties into our model. 

Spatial and temporal feature functions 

Figure 4 shows that there are strong gradients at some 
non-contour positions and relatively weak gradients at 



128 v 




Figure 6 Inner edge feature function. Response of the feature 
function for the inner edges to the image in Figure 5. The ground 
truth inner contour is indicated with a solid yellow line and the outer 
with a broken line. 



some contour positions, especially when papillary muscles 
are close to the endocardium border, resulting in the fea- 
ture responses of Figures 6 and 7. This is undesirable as 
inference from these features alone would cause incorrect 
segmentations in these areas. 

We now proceed to introduce spatial continuity, as 
well as temporal information directly into the model by 
introducing the following feature functions, 



fr(pn(t),p n -i(t)) = 



Pn(t) - p n -l(t) 



ft(pn(t),Pn (f-1)) 



/Pn 



M 



(t) - Pnit- 1) 

M 



(12) 



(13) 



where M is again the discretisation value, used here to 
scale the feature values to the same order of magnitude as 
the features described previously. From direct inspection, 
few contours violate the properties \p n (t) — p n (t — 1)| < 
25 and \p n (t) — p n -\(t)\ < 2. This can be exploited 
during inference by applying a beam search and thereby 
significantly reducing the search space. 



128 v 




Figure 7 Outer edge feature function. Response of the feature 
function for the outer edges to the image in Figure 5. The ground 
truth outer contour is indicated with a solid yellow line and the inner 
with a broken line. 
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We also detect and penalize contour growth dur- 
ing systole and shrinkage during diastole by assum- 
ing end-systole (maximum contraction) is reached at 
time t E s> 



ft(Pn(t),Pn (*-!)) = 



[pn(f- 1) < p n (t)] iff < t E S 

[p n (t) < p n (t — 1)] otherwise. 

(14) 



For simplicity, we have chosen a fixed t£s = 8 from 
inspection of the training annotations. The correct selec- 
tion of tES is sensitive to patient pathology. A more robust 
choice might depend on detecting when the optical flow in 
the images are suspended and reversed. A full discussion 
is, however, beyond the scope of this article. Again, these 
features are constructed separately for both the inner and 
outer contours. 

Additionally, we assume that the "angular gradient" of 
the cardiac wall just inside the outer contour remains 
small over time and space through, 

//' (p° Ut (0. PT (t - D) = \d n (t, P T\t) ~ 6p) 

- d n (t,p%*(t-l)-e p )\, 
(15) 



and 



fr (pT(tlPn-l(t)) = \dn (fcP^W " €p) 

-d n (t,p™\(t)-e i 



where e p = 2 is an experimentally chosen radial offset. 
Similar feature functions are constructed for the intensity 
just outside the inner contour. 

Inner-outer radius feature functions 

Information on the cardiac structure can further be 
exploited by using the relationship between the inner and 
outer contours. The ratio between the inner and outer 
radii, rjf(£)/r^ ut (£), (and thus the difference in log-space, 
|p^ ut (0 — P l n(t)\) is found to contain information on the 
temporal behaviour, as can be seen in Figure 8. This ratio 
is related to the wall thickness but is invariant to scaling, 
which can occur due to differences in patient physiology 
and MRI magnifications. 

A probability distribution of the log-radial distance 
between the inner and outer contours, P(p° ut (t) — 
p l „(t)), is derived from annotations in a training dataset 
and used to construct the feature function (see Figure 9), 



/r oss ceo, pT (o) = - iogp (p° ut (o - Pnd)) ■ 



.0 0.25 0.5 0.75 1. 




Figure 8 Inner-outer ratio histogram. Histogram of relationship 
between inner and outer radii, rf 7 n (f)/r° ut (f) / against time, generated 
from a training dataset. 



The relative homogeneity of the cardiac wall can also 
be exploited by minimizing the variance in intensity of the 
area between the inner and outer contours through, 

1 pTh) 

f™ SS (p™(t),p™\t),d n (t)) = - (dn(t,p)-^nf 



P=pf{t) 



(16) with the mean wall colour 



x Pn(t) 

P=pf(t) 

and the wall width 

W = p™\t) - p™(t). 



(18) 



(19) 



(20) 



(17) 
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Figure 9 Inner-outer feature function. Response of feature, ff ross : 
difference between inner and outer log radii, |p° ut (0 - P^(t)\ 
against time. 
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Segmentation 

Given a video sequence and trained parameters, we esti- 
mate the most probable radii from 

p* = argmaxPOo|0,D), (21) 
p 

or equivalently the radii with the smallest energy, 

p* = zrgminE (p\O f D). (22) 
p 

Belief propagation is a popular technique for inferring 
values for the latent variables in a graphical model Mes- 
sages representing cumulative belief are passed between 
variables and updated according to a specific order or 
schedule. Backtracking is used to recover a solution. For a 
tutorial on belief propagation see e.g. [17]. When applied 
to chains or tree structures, belief propagation is equiva- 
lent to dynamic programming, however our model con- 
tains many loops and thus requires the iterative loopy 
belief propagation. For more detail, specifically as applied 
to this type of edge model, see e.g. [15]. we have chosen 
a sequential propagation schedule which allows for faster 
convergence. 

Propagation schedule 

In a parallel propagation schedule all messages are 
updated simultaneously after each iteration. However, in 
our experience, the effects from a feature function prop- 
agate relatively slowly through the model if a parallel 
schedule is followed. This is in line with Goldberger and 
Kfir [18]. For this reason we have chosen a sequential 
propagation schedule which allows for faster convergence. 

For variables representing the inner contour, messages 
are first propagated in an angular direction (n = 0, . . . , 
N — 1) and reversed (n = N — 1, . . . , 0) before being prop- 
agated to the next temporal frame (t = 0, . . . , T — 1) and 
back (t = T — 1, . . . , 0). Similar steps are then repeated 
for the outer contour, taking into account messages passed 
from the inner contour. This process is repeated for three 
iterations. 

Propagating over the angular direction first places more 
emphasis on contour continuity than the other radial rela- 
tionships. The same reasoning is used in the selection of a 
backtracking order, as discussed below. 

Message normalization 

As features in an undirected graphical model are allowed 
to take on arbitrary values, propagated messages do not 
represent marginal probabilities, as is the case in directed 
graphs. If loopy belief propagation is applied without mes- 
sage normalization, numerical overflow can occur after 
only a few iterations. 

We normalize our messages by subtracting the smallest 
value in each message before it is propagated, effectively 
assuring that the minimum value in each message is zero. 



For a max-product setting this is equivalent to normaliz- 
ing each message so that the largest value in the message 
is one. 

Convergence and optima I ity 

We briefly discuss issues of convergence of message pass- 
ing and on the choice of an appropriate backtracking 
order. 

Belief propagation is guaranteed to yield a globally opti- 
mal result when applied to a tree structured graph [17]. 
It also converges to a stable fixed point (which is glob- 
ally optimal) or periodically oscillates when applied to 
graphs with a single loop [19]. Due to the temporal and 
inner-outer radius feature functions our graphical model 
contains many loops (Figure 5). 

Convergence for graphs with this many loops is not 
guaranteed, although if convergence is reached there are 
theoretical results regarding its optimality. Weiss and 
Freeman [19] describe the neighbourhood within which 
the result is optimal. 

Through experimentation we find that the inferred seg- 
mentation is sensitive to the order of backtracking. We 
choose to backtrack over all nodes in the inner and outer 
contours independently. We find that this increases the 
probability that the inferred inner and outer contours in a 
single frame form continuous loops. 

We have not observed divergence or significant oscilla- 
tion between configurations in our application. 

CRF parameter estimation 
Maximum likelihood estimation 

Before performing segmentation of a newly observed 
video sequence, suitable CRF model parameters 0 are 
needed. One popular approach is to search for parame- 
ters that maximize the likelihood of the training anno- 
tations, i.e. 

0* = arg max (p (i) \D (i) , O^J , (23) 
i 

where is a video sequence from a training dataset and 
p® is its human annotated segmentation. 

Often an iterative gradient-based method is used to find 
adequate parameters. Calculating the probability (or its 
derivatives) for a specific 0, requires evaluation of the par- 
tition function (or its moments). The partition functions 
derivative is given by, 




Dreijer etal. BMC Medical Imaging 201 3, 13:24 Page 9 of 23 

http://www.biomedcentral.eom/1 471-2342/1 3/24 



Here the sum is over all configurations of p which 
requires O (M 2A/T ) operations. This is one of the sig- 
nificant challenges in applying CRF/MRFs to practical 
problems as the complexity quickly becomes intractable 
in general 

Attempts have been made by others to approximate the 
partition function [20,21], which implies an approxima- 
tion to the original distribution P that we refer to as the 
distribution P. 

Apart from these computational difficulties, also note 
that the approximate nature of loopy belief propaga- 
tion causes inference to yield a segmentation that is the 
optimal configuration to a different probability distribu- 
tion (that we refer to as Q) than the one described in 
Section "The CRF model" (referred to as P). The relation- 
ship between these distributions is discussed by Weiss and 
Freeman [19]. 

The effect of using parameters that maximize the likeli- 
hood of the training data under P, to infer values from the 
distribution Q is unclear. 

However, since we are primarily interested in obtaining 
parameters that yield good segmentations under inference 
and less with estimating the "true" model distribution, 
we investigate an alternative that avoids calculating the 
partition function. 



Parameter estimation 

Consider a video sequence from a training dataset 
and its human annotated segmentation p®. We are inter- 
ested in obtaining parameters that would lead to a seg- 
mentation, p*( l \ of the video sequence, that does not 
significantly differ from the annotated segmentation, p®. 

Moreover, we wish to find parameters that minimise the 
errors over the entire training dataset 

8* = arg min * (p (i \ P* {i} ) J (25) 

where the inferred segmentation of a video sequence from 
loopy belief propagation is 

p* (i) = arg max Q (p |0, £> (/) ) . (26) 

The function ei is a measure of the differences between 
two segmentations (i.e. the ground-truth and inferred 
contours). The landmark error, i.e. the average of the 
shortest distance between each point on the ground 
truth contour and the inferred contour, is used by 
Andreopoulos et al [9] in their evaluation. Due to its non- 
symmetry, i.e. ei (p A , p B ) 7^ ei (p B , p A ),we have observed 
that inferred contours that minimize this error tend to 
be very jagged. For this reason we use the average of 
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the "ground truth to inferred" error and the "inferred to 
ground truth" error during training, i.e. — 



)+ei[p°>P 



Powells method [22] is a popular technique for search- 
ing for the local minimum of a function and is used here 
to find suitable parameters. Powells method avoids calcu- 
lating a gradient through a bidirectional line search along 
a vector in a list of vectors. The list is updated by the 
displacement after each improving iteration. In our case 
convergence requires about 300 iterations. 

To compensate for the approximate nature of inference 
and avoid calculating the partition function, a gradient- 
free approach is thus followed where the segmentation 
error is treated as a "black box". This allows us to integrate 
the approximate inference process into the training stage. 

Implementation 

The majority of the software is implemented in the Python 
programming language with belief propagation imple- 
mented in C. Cordero-Grande et al. [13] reported segmen- 
tation of a single 4D video sequence in approximately 56 
minutes on a single 800MHz CPU with 4MB cache from 
their MRF. For a similar number of images (12 slice posi- 
tions with 20 frames each) on a single 3400MHz CPU with 
8MB cache we can report a radial segmentation time of 
~2 minutes. We have not included the time to estimate 



the centre point positions, which is approximately an 
additional minute for each 20 frame video sequence. 

Our radial inference technique is thus approximately six 
times faster (taking into account the faster CPU), which 
can be attributed to their use of a Gibbs sampler. Note 
also that the number of radial points we use to represent 
a contour and its radial discretisation are 128 and 256, 
respectively. This is significantly larger than the values of 
7 and 31 used by Cordero-Grande et al, especially when 
comparing the size of the resulting configuration space. 

Results 

In this section we analyse the segmentations produced by 
our process and compare them to those of a few exist- 
ing techniques. The reader is urged to view the video 
sequences in Additional file 1 or on our website at http:// 
dip.sun.ac.za/~janto. Additional file 2 contains the auto- 
matic segmentations of all end-systole and end-diastole 
frames for the Sunnybrook evaluation dataset. 

We evaluate our model on two datasets. Our segmen- 
tation process is trained and analysed on the York dataset 
[9] with respect to segmentation behaviour and its sensi- 
tivity to placement of the initial centre point. This dataset 
contains ground truth annotations for all frames and 
therefore contains important temporal information. For 
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Figure 1 1 Bad quality York automatic segmentations. Selection of incorrectly segmented contours (inner contour is blue and outer is c 
inferred from a testing set. The blue dot in the middle of the blood pool is the estimated centre point. 
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Figure 12 Disappearing blood pool. Incorrect automatically segmented contours (inner contour is blue and outer is green) of Subject 8 due to a 
disappearing blood pool. The blue dot in the middle is the estimated centre point. 



comparison with other authors our technique is evaluated 
on the Sunnybrook [5] MRI dataset. This dataset is not 
annotated for all frames, but has the advantage that more 
authors have used this set to report their results. 

In addition to the landmark distance, we use two other 
similarity measures that have gained widespread use: the 
Dice metric and Average Perpendicular Distance (APD). 
The Dice similarity represents the percentage of overlap 
between two segmented surfaces and the Dice error is the 
percentage of non-overlap, i.e. one minus the Dice simi- 
larity. The APD is calculated as the average of the perpen- 
dicular distance from each point on the reference contour 
to the target contour. The APD is therefore similar to the 
non-symmetric landmark distance. 

York cardiac segmentation dataset 

In this section we evaluate our model on the MRI York 
dataset [9] provided by the Department of Diagnostic 
Imaging of the Hospital for Sick Children in Toronto 
and annotated by Andreopoulos of York University. The 
dataset contains video sequences from 33 subjects, all 
under the age of 18, displaying a variety of heart abnor- 
malities such as cardiomyopathy, aortic regurgitation, 
enlarged ventricles and ischemia. We split the dataset 
into three cross-validation sets: 11 subjects for training 



of the edge classifier, 11 subjects for CRF parameter esti- 
mation and 11 subjects for evaluation. Each set effectively 
contains approximately 100 video sequences and each 
video contains 20 frames at different z-axis slice positions. 
The inner and outer contours are manually annotated 
for all frames and are used as the ground truth in our 
experiments. 

For efficiency we only use video sequences of the sixth 
z-axis slice of each patient to estimate the CRF parame- 
ters. These slices do not necessarily coincide with the mid 
ventricle for all patients. The number of slices where the 
ventricle is visible differs between patients and is likely 
dependent on patient pathology. A different choice of 
training data is thus likely to have a significant effect on 
the results of patients with, for example, hypertrophy. 

Segmentation analysis 

Figure 10 illustrates our results on a selection of images 
from the testing dataset. A visual inspection indicates 
that, for the majority of video sequences, our automated 
annotations are in line with expected behaviour with 
regard to shape, position and motion. Of particular inter- 
est is the inclusion of the papillary muscles inside the inner 
contour. In many of these cases the segmentation pro- 
cess is able to use local shape and temporal behaviour to 



Table 1 Comparison of contour errors 



Authors 


Technique 


Inner contour 


Outer contour 






error [mm] 


error [mm] 


Our method 


CRF 


1.57 


1.78 


Our method (without subjects 8 and 27) 


CRF 


1.49 


1.74 


Andreopoulos and Tsotsos [9] 


AAM 


1.43 


1.51 


Uzumcu [23] 


Landmark tracking 


1.86 


1.77 


Jolly [14] 


Shortest path 


2.44 


2.05 


Cordero-Grande et al. [1 3] 


Edge MRF 


1.37 


1.22 


Lorenzo-Valdes et al. [24] 


Surface MRF 


2.99 


2.21 



Average segmentations errors as reported by different authors. These results are not strictly comparable since they are based on different datasets and the error 
criteria differ. Refer to the individual papers for more detail. 
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identify the inner contour even though the edge is weak or 
absent. 

We also observe robustness to some noisy images. 
These examples suggest that in some of these images 
where the automatic segmentation differs from the 
ground truth, the automatic segmentation is superior to 
the manual approach. This is attributed to the fact that the 
automated system is able to integrate temporal behaviour, 
something that is an arduous task for a human. 



Figure 11 contains a selection of images that are 
incorrectly segmented. These are primarily due to a disap- 
pearing blood pool, centre point initialization outside the 
endocardium, or low contrast between the endocardium 
and cardiac wall. 

On further inspection of the York dataset, the blood 
pool disappears from view in some video sequences of 
Subject 8 (refer to Figure 12) which is diagnosed with ven- 
tricular hypertrophy (enlarged cardiac wall thickness). It 
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0.01 1 1 1 1 1 

0 5 10 15 20 

time [frame number] 

Figure 15 Inner Dice errors overtime. Inner contour Dice errors overtime. For illustrative purposes, a random real value between zero and one 
was added to each frame number. The geometric mean for each frame number is indicated by a black line. 



is also noted by the annotator of the dataset [9] that this 
yields bad segmentations in their work. Images of Subject 
27 also has relatively low contrast between the endo- 
cardium and the cardiac wall. We therefore regard Subject 
8 and 27 as outliers and remove them from the dataset. 
This significantly improves inner contour accuracy as 
indicated in Table 1. 



The areas inside the contours of the automatic 
annotations are plotted against the areas inside the human 
annotated ground truth in Figures 13 and 14. For small 
inner contours our technique often yields segmentations 
larger than the ground truth. This can be attributed to 
the automated segmentations being more "inclusive" of 
the papillary muscles, which can significantly effect small 
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0.01 1 1 1 1 1 1 1 

0 2 4 6 8 10 12 14 

slice depth 

Figure 17 Inner Dice errors over slices. Inner contour Dice errors over different slices. For illustrative purposes, a random real value between zero 
and one was added to each slice depth. The geometric means for the slice positions are indicated by the black line. 



contours. Our technique also provides slightly smaller 
outer contours. A comparison with the ground truth indi- 
cates that our segmentation is often temporally smoother. 
This is attributed to the human annotator segmenting one 
frame at a time, and thereby largely disregarding temporal 
behaviour. 

Figures 15, 16, 17 and 18 show frame Dice errors 
against time and slice position for the inner and outer 



contours. The vertical axes of these graphs are logarith- 
mically scaled. The geometric means (arithmetic mean 
in the log-scale) in these figures are indicated by solid 
black lines. We observe that the majority of incorrect 
segmentations occur during end-systole (t & 8) and spa- 
tially lower slices (depth > 9) where the blood pool is 
at its smallest (sometimes completely disappearing from 
view). In these frames papillary muscles are most visible, 
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obscuring not only the border, but a significant part of the 
endocardium. 

Table 1 contains segmentation errors of the inner and 
outer contours as reported by a selection of different 
authors. We report our results as the landmark error 
[9], i.e. the average of the shortest distances between the 
points on the ground truth contour and the inferred con- 
tour. These results are, however, from different datasets 
and there are also subtle, but important, differences in 
the error criteria. Refer to the individual papers for more 
detail. This table should therefore only be used as a rough 
comparison to other techniques. For a more comprehen- 
sive summary of reported errors see [25] . The Sunnybrook 
dataset is used in a later section to more fairly compare 
our technique to others. 

Sensitivity to initial centre point placement 

Figure 19 illustrates the sensitivity of the segmentation to 
incorrect placement of the initial centre point, c(O). For 
each video sequence, an initial centre point is generated 
at a fractional distance, d, between the ground truth cen- 
tre and a randomly selected point on the ground truth 
inner contour. The CRF parameters are not retrained on 
these imperfectly placed centre points, which would pos- 
sibly allow the system to weigh shape information less and 
thus improve results. 

As can be seen from Figure 19, the segmentations 
of the inner and outer contours remain relatively sta- 
ble while c(0) is within the inner 20% of the endo- 
cardium. When placed at approximately 50% between the 



ground truth centre point and inner contour, the spa- 
tial continuity assumption of (12) is violated enough that 
the quality of the inferred contours begin to deteriorate 
significantly. 

Sunnybrook cardiac segmentation dataset 

The Sunnybrook Cardiac MR Database [5] is provided 
by the Sunnybrook Health Sciences Centre and was used 
for the 2009 MICCAI Cardiac MR Left Ventricle Seg- 
mentation Challenge. The dataset contains 45 subjects, 
with an average age of 61, with diverse morphologies 
(Heart failure with and without infarction, LV hypertro- 
phy, and healthy subjects) and is manually segmented by 
a cardiologist. The inner contours are annotated only at 
end-diastole and end-systole, while the outer contours are 
annotated only at end-systole. A ground truth segmenta- 
tion of the intermittent frames is therefore not available, 
making extraction of temporal information for this dataset 
difficult. 

Due to the sparsity of annotations in this dataset, all 
feature functions are re-used as derived from the York 
dataset. The model used to segment the Sunnybrook 
data therefore includes features derived from the trained 
edge classifiers, temporal behaviour and inner-outer rela- 
tionships of the York dataset. Only the CRF parameters 
(i.e. the relative importance of the features) are retrained 
on a training subset of the Sunnybrook data. To further 
compensate for the relatively few examples, and thus avoid 
over- fitting, a very weakly weighted LI -norm parameter 
regularization term |log0^|) is added to the objective 



10 



Q) 6 
C1J 



o o average inner contour error 
• • average outer contour error 



o 

o • 



• o 



o o o o u 

!«:::• 



0.0 



0.2 



0.4 



0.6 



0.8 



1.0 



Figure 19 Centre point robustness. Sensitivity of segmentation with regard to incorrect centre point in first frame. The distance d is the fractional 
position between the ground truth centre point and the ground truth inner contour. See Section "Sensitivity to initial centre point placement" for 
more detail. 



Dreijer et al. BMC Medical Imaging 201 3, 1 3:24 
http://www.biomedcentral.eom/1 471-2342/1 3/24 



Page 16 of 23 



function. Since this term is applied to the log of the param- 
eters, it effectively penalizes the specialization on features 
by the optimizer. A detailed analysis of the effects of 
parameter regularization in our application is beyond the 
scope of this article. 

A notable difference from the York dataset is the pres- 
ence of images of patients with heart failure, where the 
cardiac wall is exceptionally thin, as indicated by the yel- 
low arrows in Figure 20. To compensate we modified the 
belief propagation beam search by reducing the minimum 
allowed difference between the inner and outer variables 
(i.e. wall thickness) from 10 to 2. We also reduced the 
radial offset used in the features described by (15) and (16) 
from e p = 2 pixels to e p = 1, to adequately capture the 
wall colour when very thin. 

This modification suggests that there are important 
model parameters that are dependant on the patient 
pathology. A practical segmentation tool could allow the 
operator the option to provide a prior diagnosis or more 
fine grained control over some settings. 

During training of the edge classifier from the York 
dataset, it was assumed that the extracted radial win- 
dow, v pt contains an edge if the annotated edge is within 
two radial distances from the middle of the window. 
This assumption is problematic if used to train the edge 
classifier on the Sunnybrook dataset. A small wall thick- 
ness, as is common in this dataset, would cause both 
the inner and outer contours to fall within this crite- 
ria, resulting in inconsistencies in the training examples 
and thus weakening the resulting edge detector. To train 
a classifier on this dataset it would thus be necessary 
to be more strict with regard to the minimum radial 
distance. 

Segmentation analysis 

Figure 21 contains a selection of images from this dataset 
and their automatic segmentations. See also Additional 



file 2. A qualitative examination of the results suggests 
that segmentations are generally of good quality; the 
papillary muscles are included even if they obscure the 
endocardium border. 

Bland-Altman plots of end-diastole volume, end- 
systole volume, ejection fraction and mass are shown in 
Figures 22, 23, 24 and 25, respectively. The end-diastole 
volumes, as predicted by our technique and as annotated 
by the human specialist, agree with a small bias and vari- 
ance (—3.35 zb 7.61 ml). End-systole volumes agree with 
a small bias, but a relatively large variance (1.75 zb 20.21 
ml). This leads to a relatively small bias, but relatively 
large variance in the agreement of the calculated ejection 
fractions (—4.66 zb 10.73%). The calculated left ventricle 
mass has a small bias and variance (—0.95 zb 11.58 g). 

The end-diastole volume in Figure 22 and Bland-Altman 
plot of end-diastole inner contour area in Figure 26 
also illustrate the algorithms tendency to yield contours 
smaller than the ground truth at end-diastole. The Bland- 
Altman plot of end-systolic volume in Figure 23 and 
end-systolic inner contour area in Figure 27 illustrate 
that small volumes are overestimated and large volumes 
underestimated. 

Table 2 contains a summary of Dice errors and 
APD as reported by various authors on the Sunny- 
brook dataset during the challenge [5] including our 
results, before and after parameter retraining. Prior to 
retraining, results are comparable but slightly worse 
than the top performing challenge entries. After param- 
eter re-estimation on the training subset our results 
outperform the entries on the evaluation set in terms 
of the inner contours' Dice metric. Our average Dice 
similarity of the outer contours is comparable to the 
best performing entries in the challenge. Our average 
APD for the inner and outer contours are equal to 
or smaller than reported by any of the authors in the 
challenge. 
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Figure 21 Sunnybrook automatic segmentations. Selection of images and their automatically segmented contours (inner contour is blue and 
outer is green). The blue dot in the middle of the endocardium is the estimated centre point. 



Table 3 provides a more detailed report on the result- 
ing segmentations of the patients in the evaluation set, 
as generated by the evaluation code provided with the 
dataset. The table also indicates the percentage of good 
contours for each subject, i.e. those with an APD smaller 
than 5 mm. 

Our segmentation process performs well on those 
subjects in the evaluation set with normal heart func- 
tion (SC-N) and those with heart failure with (SC-HF-I) 
and without infarction (SC-HF-NI). Our process performs 
worst on the inner contour of patients in the evaluation set 
with hypertrophy (SC-HYP) as is illustrated in Figure 28. 



This effect is also observed in other models by other 
authors [25]. 

Discussion 

In this article, the CRF parameters are estimated using 
a gradient-free search approach. The time spent search- 
ing for parameters can be further reduced by considering 
only a subset of the training videos. Due to the relatively 
low dimensionality of the parameter search space 14 
dimensions) and relatively informative nature of each 
video sequence for all parameters, data scarcity is not a 
problem. It is, however, advisable that the training set 
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Figure 22 Bland-Altman plot of EDV. Bland-Altman plot of automatically determined minus ground truth end-diastole volume including mean 
difference and standard deviation lines. 
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Figure 23 Bland-Altman plot of ESV. Bland-Altman plot of automatically determined minus ground truth end-systole volume including mean 
difference and standard deviation lines. 



contains a sufficient number of video sequences in which 
the papillary muscles obscure the endocardium border. 

Note that no image preprocessing is done to compen- 
sate for effects such as different intensity settings on the 
MRI equipment. Image equalisation before segmentation 
would likely result in more robust result. 

Inclusion of spatially neighbouring slices into a uni- 
fied 3D and time model might also increase segmentation 



accuracy. Information at spatially higher slices, where the 
papillary muscles are less problematic, could then be used 
to improve the accuracy at lower slices. Linking the radial 
values between different slices would be relatively simple. 
This can be done with appropriate feature functions simi- 
lar to those restricting radial continuity in a single contour. 
This would, however, require aligning slices to compen- 
sate for translation caused by different levels of inhalation 
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Figure 24 Bland-Altman plot of EF. Bland-Altman plot of automatically determined minus ground truth ejection fraction including mean difference 
and standard deviation lines. 
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Figure 25 Bland-Altman plot of LV mass. Bland-Altman plot of automatically determined minus ground truth LV mass including mean difference 
and standard deviation lines. 



and expiration between slices. It should also be simple 
to extend the segmentation process to include these 
additional features by adding them to the propagated 
messages. 

Ideally, the time of end-systole in (14) should not be 
decided before inference, however, this might require 
a second order CRF or modelling as an additional 



unobserved variable. This would lead to an increase in 
segmentation time. Alternatively, techniques based on 
detecting temporary suspension and reversal of optical 
flow could be useful in detecting end-systole. 

A second order system would also make the incorpo- 
ration of contour smoothness information possible, as 
currently only contour continuity is taken into account. 
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Figure 26 Bland-Altman plot of ED area. Bland-Altman plot of automatically determined minus ground truth end-diastole area including mean 
difference and standard deviation lines. 
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Figure 27 Bland-Altman plot of ES area. Bland-Altman plot of automatically determined minus ground truth end-systole area including mean 
difference and standard deviation lines. 



Alternatively, post-processing of the resultant contours by 
fitting them to splines, would improve smoothness. 

From a visual inspection of the ground truth it is clear 
that there are inconsistencies in the human annotations 
with regard to the inclusion of the papillary muscles in 
the inner contour. These inconsistencies reduce the dis- 
criminative ability of the edge classifier and influences the 
optimal CRF parameter values estimated during training. 
Also, because the human annotated contours are used 
for evaluation, inconsistencies of the human annotations 
need to be taken into account when interpreting any 
results based on this as ground truth. In short: incon- 
sistent examples in the training and evaluation set will 



result in an upper limit to the accuracy achievable by any 
consistent system. 

Conclusion 

We present a CRF implementation for the automated seg- 
mentation of the left ventricle. Features are derived from 
discriminative properties of a human annotated dataset. 
The algorithm exhibits robustness against inclusion of 
the papillary muscles by integrating shape and motion 
information. 

Experiments on the Sunnybrook dataset suggests that 
our technique would provide segmentations superior to 
those reported in the challenge. 



Table 2 Comparison of Sunnybrook Dice errors 



Authors 




Dice similarity 




APD (mm) 






Inner 


Outer 


Inner 




Outer 


Our method (trained on York) 


0.87 


0.92 


2.70 




2.23 


Our method (after retraining) 


0.91 


0.93 


1.84 




1.95 


Maraketal. [26] 


0.86 


0.93 


2.6 




3.0 


Lu et al. [27] 


0.89 


0.94 


2.07 




1.91 


Wijnhoutetal. [28] 


0.89 


0.93 


2.29 




2.28 


Casta etal. [29] 




0.93 






2.72 


O'Brien etal. [30] 


0.81 


0.91 


3.73 




3.16 


Constantinides et al. [31] 


0.89 


0.92 


2.35 




2.04 


Huang S. etal. [32] 


0.89 


0.94 


2.10 




1.95 


Jolly [14] 


0.88 


0.93 


2.44 




2.05 



Average Dice similarity metric and Average Perpendicular Distance (APD) for our segmentation of the Sunnybrook validation set (before and after training) and those 
reported by the different challenge entries. 
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Table 3 Sunnybrook results 



Patient Good (%) APD (mm) Dice similarity 





Inner 


Outer 


Inner 


Outer 


Inner 


Outer 


SC-HF-l-05 


100 


100 


1.52 


1.86 


0.94 


0.95 


SC-HF-l-06 


100 


100 


1.66 


1.45 


0.92 


0.95 


SC-HF-l-07 


100 


100 


2.37 


2.79 


0.89 


0.90 


SC-HF-l-08 


95 


100 


1.68 


1.37 


0.93 


0.96 


SC-HF-NI-07 


100 


100 


2.21 


2.20 


0.91 


0.93 


SC-HF-NI-11 


100 


100 


1.70 


1.25 


0.93 


0.96 


SC-HF-NI-31 


100 


100 


2.06 


1.58 


0.91 


0.95 


SC-HF-NI-33 


94 


100 


1.68 


1.64 


0.91 


0.94 


SC-HYP-06 


92 


100 


1.67 


2.05 


0.90 


0.92 


SC-HYP-07 


69 


100 


1.39 


1.83 


0.93 


0.94 


SC-HYP-08 


68 


100 


2.27 


2.33 


0.90 


0.93 


SC-HYP-37 


69 


71 


2.21 


2.44 


0.86 


0.91 


SC-N-05 


93 


100 


1.67 


2.45 


0.89 


0.89 


SC-N-06 


100 


86 


1.76 


2.12 


0.89 


0.91 


SC-N-07 


100 


100 


1.79 


1.95 


0.88 


0.90 


mean 


92 


97 


1.84 


1.95 


0.91 


0.93 


std deviation 


12.4 


8.0 


0.30 


0.44 


0.02 


0.02 


Patient specific Average Perpendicular Distance (APD), and Dice similarity between annotations and ground truth of the Sunnybrook validation set. 



The most significant segmentation errors are present 
in images of patients with hypertrophy, when the blood 
pool disappears from view. This limitation is due to the 
assumption that the inner contour is present in each 
frame. Future work could address these failures, possibly 
through the modelling of the right ventricles center point 



to avoid the LV outer contour from snapping to the outer 
edge of the entire heart structure. 

Additional modelling of the right ventricle would also 
be beneficial, but is complicated by the polar coordinate 
space formulation, which allows only for one center point 
per frame. 




Figure 28 Bad quality Sunnybrook automatic segmentations. Selection of images from validation patient images with hypertrophy and their 
low quality automatically segmented contours (inner contour is blue and outer is green). The blue dot in the middle of the endocardium is the 
estimated centre point. 



Dreijer et al. BMC Medical Imaging 201 3, 1 3:24 
http://www.biomedcentral.eom/1 471-2342/1 3/24 



Page 22 of 23 



Future work could also include faster and more robust 
centre point estimation. As mentioned previously, fram- 
ing the centre point estimation in a model that can be 
solved with dynamic programming allows us to formulate 
its optimization as a belief propagation algorithm. This 
has the advantage that the centre point can be estimated as 
an additional latent variable in our CRF model, that needs 
to be inferred. Alternating between inferring p and c is 
thus theoretically possible, although insufficient research 
has been done as to its practical value. 

Additional files 



Additional file 1 : Automatically segmented videos. A video containing 
automatically annotated sequences from different patients for the York 
dataset. 

Additional file 2: Large Sunnybrook montage image. Montage of all 
end-systole and end-diastole automatic segmentations for the 
Sunnybrook dataset. 
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